ABSTRACT

We introduce ANNz, a freely available software package for photometric redshift estimation using Artificial
Neural Networks. ANNz learns the relation between photometry and redshift from an appropriate training set
of galaxies for which the redshift is already known. Where a large and representative training set is available,
ANNz is a highly competitive tool when compared with traditional template-fitting methods.
The ANNz package is demonstrated on the Sloan Digital Sky Survey Data Release 1, and for this particular data
set the r.m.s. redshift error in the range 0 < z < 0.7 is σ_rms = 0.023. Non-ideal conditions (spectroscopic sets
which are small, or which are brighter than the photometric set for which redshifts are required) are simulated
and their impact on the photometric redshift accuracy is assessed.

The package may be freely downloaded from http://www.ast.cam.ac.uk/~aac.
Subject headings: surveys — galaxies: distances and redshifts — methods: data analysis 



1. INTRODUCTION 

In its most general sense, the term photometric redshift 
refers to a redshift estimated using only medium- or broad- 
band photometry or imaging. Most commonly, photometric 
redshifts are determined on the basis of galaxies' colours in 
three or more filters (thus giving a very coarse approxima- 
tion to the spectral energy distribution, hereafter SED), but 
they could also be based on other properties which can be de- 
rived from images, such as the angular size or concentration 
index. The method has found successful application to deep-field
and wide-field surveys such as the Hubble Deep Field
(e.g. Fernández-Soto, Lanzetta, & Yahil 1999) and the Sloan
Digital Sky Survey (Csabai et al. 2003).

The most commonly used approach to photometric redshift 
estimation is the template-matching technique. This requires 
a set of 'template' SEDs covering a range of galaxy types, 
luminosities and redshifts appropriate to the population for 
which photometric redshifts are required. For a particular tar- 
get galaxy, the photometric redshift is chosen to be the red- 
shift of the most closely matching template spectrum; this is 
usually defined as the template which minimizes the χ² between
the template and observed magnitudes.
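The χ² template-matching step can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the implementation used by any particular package: the function name `template_photoz` and the toy template grid are hypothetical, and we assume (as is usual) that each template's overall brightness is a free parameter, fitted here analytically as an additive magnitude offset before χ² is evaluated.

```python
import numpy as np


def template_photoz(m_obs, sigma, template_mags, template_z):
    """Return (z, chi2) of the template that best matches the observed magnitudes.

    m_obs         : (n_filters,) observed magnitudes of one galaxy
    sigma         : (n_filters,) magnitude errors
    template_mags : (n_templates, n_filters) template magnitudes, one row per
                    (SED, redshift) combination
    template_z    : (n_templates,) redshift of each template row
    """
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    diff = np.asarray(m_obs, dtype=float)[None, :] - template_mags
    # Assumption: the template normalization is free, so fit the weighted
    # mean magnitude offset for each template analytically.
    offset = (diff * w).sum(axis=1) / w.sum()
    chi2 = (((diff - offset[:, None]) ** 2) * w).sum(axis=1)
    best = int(np.argmin(chi2))
    return template_z[best], chi2[best]


# Toy grid: one hypothetical SED whose colours redden linearly with z.
z_grid = np.linspace(0.0, 0.7, 15)
filters = np.arange(5)
templates = 20.0 + 2.0 * z_grid[:, None] * filters[None, :]
m_obs = templates[6] + 1.7  # the z = 0.3 template, 1.7 mag fainter
z_best, chi2_best = template_photoz(m_obs, np.full(5, 0.05), templates, z_grid)
```

Because the offset absorbs any overall shift in brightness, only the colours (the shape of the SED) determine which template wins.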

The template spectra are usually derived from a small set 
of SEDs representing different classes of galaxy at redshift 
z = 0, which are then artificially redshifted to give a discrete
sampling along the redshift axis (note that this method
does not account for evolution with redshift). Commonly
used template sets are the Coleman, Wu, & Weedman (CWW;
1980) SEDs, which are derived observationally, or those of
Bruzual & Charlot (1993), derived from population synthesis
models. The template-matching technique owes its popularity 
to the very few resources required for a basic implementation 
(i.e. a handful of template SEDs), but the accuracy of the tech- 
nique strongly depends on the extent to which the template 
spectra are representative of the target populations: for exam- 
ple, template SEDs derived from observations of low-redshift 
galaxy populations may be a poor match for populations at 
higher redshifts. 
The chances of success can be improved by increasing the
number of templates, or by more carefully matching the
templates to the populations being studied. For example, the
spectroscopic catalogue of the Sloan Digital Sky Survey (SDSS;
York et al. 2000) could be used to produce a set of templates
which are highly representative of the SDSS photometric
catalogue (Csabai et al. 2003). However, in situations with
such a large amount of prior redshift information about the 
sample, the template-matching technique is not the best ap- 
proach: so-called empirical methods usually offer greater ac- 
curacy, as well as being far more efficient. 

In essence, empirical photometric redshift methods aim to 
derive a parametrization for the redshift as a function of the 
photometric parameters. The form of this parametrization is 
deduced through use of a suitably large and representative 
training set of galaxies for which we have both photometry 
and a precisely known redshift. A simple example is to express
the redshift as a polynomial in the galaxy colours (e.g.
Connolly et al. 1995; Sowards-Emmerd et al. 2000). The coefficients
in the polynomial are varied to optimize the fit between
the predicted and measured redshift. The photometric
redshift for the galaxies for which we have no spectroscopy 
can then be estimated by applying the optimized function to 
the colours of the target galaxy. 
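A least-squares fit of this kind can be sketched in a few lines of Python/NumPy. The sketch assumes a second-order polynomial in the colours (all linear, quadratic and cross terms); the function names are hypothetical, and the synthetic colours and made-up redshift relation in the usage example merely stand in for a real training set.

```python
import numpy as np


def quadratic_design(colours):
    """Columns: 1, each colour c_i, and every product c_i * c_j with i <= j."""
    n, k = colours.shape
    cols = [np.ones(n)]
    cols += [colours[:, i] for i in range(k)]
    cols += [colours[:, i] * colours[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)


def fit_polynomial(colours, z_spec):
    """Coefficients minimizing sum_k (z_pred,k - z_spec,k)^2 over the training set."""
    coeffs, *_ = np.linalg.lstsq(quadratic_design(colours), z_spec, rcond=None)
    return coeffs


def predict_polynomial(coeffs, colours):
    """Apply the fitted polynomial to the colours of target galaxies."""
    return quadratic_design(colours) @ coeffs


# Usage with synthetic data: two colours and a made-up redshift relation.
rng = np.random.default_rng(0)
train_colours = rng.uniform(0.0, 2.0, size=(500, 2))


def toy_relation(c):
    return 0.05 + 0.1 * c[:, 0] + 0.02 * c[:, 0] * c[:, 1] - 0.03 * c[:, 1] ** 2


coeffs = fit_polynomial(train_colours, toy_relation(train_colours))
target_colours = rng.uniform(0.0, 2.0, size=(5, 2))
z_photo = predict_polynomial(coeffs, target_colours)
```

Since the toy relation is itself quadratic, the fit recovers it essentially exactly; with real data the residuals measure how well a quadratic captures the redshift-colour relation.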

Ideally the training set would be a representative subset 
of the actual photometric target sample (this has the attrac- 
tive side-effect of nullifying any systematics in the photom- 
etry). However, the training set could also be derived from 
a set of template spectra or from simulated catalogues (e.g.
Vanzella et al. 2003). The photometry for the training set
must be for the same filter set and should have the same 
noise characteristics as that for the target sample. The trained 
method can usually only be reliably applied to target galax- 
ies within the ranges of redshift and spectral type adequately 
sampled by the training set. 

In this paper we introduce ANNz, a software package
for photometric redshift estimation using Artificial Neural
Networks (hereafter ANNs) to parametrize the redshift-photometry
relation. It can be shown (e.g. Jones 1990;
Blum & Li 1991) that a sufficiently complex ANN is capable 
of approximating to arbitrary accuracy any continuous func- 
tional mapping. ANNs have previously found a number of 
[Fig. 1 diagram: input layer → hidden layer → output layer]




FIG. 1. — A schematic diagram of a multi-layer perceptron, as implemented
by ANNz, with input nodes taking, for example, magnitudes m_i = −2.5 log10 f_i
in various filters, a single hidden layer, and a single output node giving, for
example, redshift z. The architecture is n:p:1 in the notation used in this
paper. Each connecting line carries a weight w_ij. The bias node allows for an
additive constant in the network function defined at each node. More complex
networks can have additional hidden layers and/or outputs.



applications in astronomy, including morphological classification
of galaxies (e.g. Lahav et al. 1996; Ball et al. 2003),
star/galaxy separation (Bertin & Arnouts 1996) and object
detection (e.g. Andreon et al. 2000). Firth, Lahav, & Somerville
(2003) previously demonstrated the feasibility of using
ANNs for photometric redshift estimation, and more recently
Vanzella et al. (2003) have applied the method to the Hubble
Deep Fields.

The layout of this paper is as follows. In §2 artificial neural
networks are introduced, and the particular methods used
by ANNz are explained. In §3 ANNz is applied to the Sloan
Digital Sky Survey. The results are compared with rival
photometric redshift estimators, and various extensions to the
basic technique are explained and illustrated. Finally, less ideal
conditions are simulated to assess their impact on the accuracy
of photometric redshift estimation. In §4 the results are
summarised, and prospects for the application of ANNz discussed.

2. ARTIFICIAL NEURAL NETWORKS 

ANNz uses a particular species of ANN known formally as
a multi-layer perceptron (MLP). An MLP consists of a number
of layers of nodes (Fig. 1; see e.g. Bishop 1995 and
references therein for background). The first layer contains
the inputs, which in our application to photometric redshift
estimation are the magnitudes, m_i, of a galaxy in a number
of filters (for ease of notation we arrange these in a vector
m = (m_1, m_2, ..., m_n)). The final layer contains the outputs;
we will usually use just one output, the photometric redshift
z_phot, but see §3.2 for an example with multiple outputs.
Intervening layers are described as hidden, and there is complete
freedom over the number and size of hidden layers used. The
nodes in a given layer are connected to all the nodes in
adjacent layers. A particular network architecture may be denoted
by N_in:N_1:N_2:...:N_out, where N_in is the number of input nodes,
N_1 is the number of nodes in the first hidden layer, and so on.
For example, 9:6:1 takes 9 inputs, has 6 nodes in a single
hidden layer and gives a single output.
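The architecture notation maps directly onto the number of weights that must be optimized. The Python sketch below counts them for a fully connected network, assuming that a single bias node feeds every non-input node (consistent with the description of the bias node in Fig. 1); the helper names are hypothetical.

```python
def parse_architecture(spec):
    """'9:6:1' -> [9, 6, 1] (input layer, hidden layer(s), output layer)."""
    sizes = [int(s) for s in spec.split(":")]
    if len(sizes) < 2 or any(n < 1 for n in sizes):
        raise ValueError(f"invalid architecture: {spec!r}")
    return sizes


def count_weights(spec, bias=True):
    """Number of weights in a fully connected feed-forward network.

    Each node is connected to every node in the next layer; with bias=True a
    bias node adds one extra connection into every non-input node.
    """
    sizes = parse_architecture(spec)
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


print(count_weights("9:6:1"))      # (9 + 1) * 6 + (6 + 1) * 1 = 67
print(count_weights("5:10:10:1"))  # 6 * 10 + 11 * 10 + 11 * 1 = 181
```

Counting weights this way gives a quick feel for how many training galaxies a given architecture might need, since every weight is a free parameter of the fit.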

Each connection carries a weight, w_ij; these comprise the
vector of coefficients, w, which are to be optimized. An
activation function, g_j(u_j), is defined at each node, taking as
its argument

$$u_j = \sum_i w_{ij}\, g_i(u_i), \qquad (1)$$

where the sum is over all nodes i sending connections to node
j. The activation functions are typically taken (in analogy to
biological neurons) to be sigmoid functions such as
g_j(u_j) = 1/[1 + exp(−u_j)], and we follow this approach here.
An extra input node, the bias node, is automatically included
to allow for additive constants in these functions.

For a particular input vector, the output vector of the net- 
work is determined by progressing sequentially through the 
network layers, from inputs to outputs, calculating the activa- 
tion of each node (hence this type of neural network is often 
referred to as a feed-forward network). 
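Equation (1) and the layer-by-layer evaluation can be sketched in Python/NumPy as follows. In this hypothetical helper each layer is stored as a weight matrix plus a vector of bias weights; the hidden nodes use the sigmoid activation given above, while the output node is taken to be linear, which is an assumption made here for a continuous output such as redshift rather than something stated in the text.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def forward(layers, m):
    """Evaluate an MLP layer by layer, from the inputs to the outputs.

    layers : list of (W, b); W[j, i] is the weight w_ij from node i to node j
             and b[j] is the weight from the bias node to node j
    m      : input vector, e.g. the magnitudes of one galaxy
    """
    a = np.asarray(m, dtype=float)          # input nodes pass values through
    for n, (W, b) in enumerate(layers):
        u = W @ a + b                       # u_j = sum_i w_ij g_i(u_i) + bias term
        is_output = n == len(layers) - 1
        a = u if is_output else sigmoid(u)  # linear output node (assumption)
    return a


# A random (untrained) 5:10:10:1 network applied to one set of magnitudes.
rng = np.random.default_rng(0)
sizes = [5, 10, 10, 1]
layers = [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
z_out = forward(layers, [19.2, 18.1, 17.6, 17.3, 17.1])
```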

2.1. Network training 

Given a suitable training set of galaxies for which we have
both photometry, m, and a spectroscopic redshift, z_spec, the
ANN is trained by minimizing the cost function

$$E = \sum_k \left[ z_{\rm phot}(\mathbf{w}, \mathbf{m}_k) - z_{{\rm spec},k} \right]^2 \qquad (2)$$

with respect to the weights, w, where z_phot(w, m_k) is the
network output for the given input and weight vectors, and the
sum is over the galaxies in the training set. To ensure that
the weights are regularized (i.e. that they do not become too
large), an extra quadratic cost term

$$E_W = \frac{\alpha}{2} \sum_{i,j} w_{ij}^2, \qquad (3)$$

where α is a weight-decay coefficient, is added to equation (2).
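The quantity being minimized can be written down directly. This Python sketch evaluates the sum-of-squares cost of equation (2) plus a quadratic weight penalty, assuming the common weight-decay form (α/2)Σw² with a user-chosen coefficient α; the function names and example numbers are hypothetical.

```python
import numpy as np


def data_cost(z_phot, z_spec):
    """E = sum_k (z_phot,k - z_spec,k)^2, as in equation (2)."""
    r = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    return float(np.sum(r ** 2))


def weight_penalty(weight_arrays, alpha):
    """Assumed weight-decay term: (alpha / 2) * sum of all squared weights."""
    return 0.5 * alpha * sum(float(np.sum(np.asarray(w, dtype=float) ** 2))
                             for w in weight_arrays)


def total_cost(z_phot, z_spec, weight_arrays, alpha):
    """Regularized cost minimized with respect to the weights."""
    return data_cost(z_phot, z_spec) + weight_penalty(weight_arrays, alpha)


cost = total_cost([0.10, 0.25], [0.12, 0.20],
                  [np.array([[0.5, -1.0]]), np.array([2.0])], alpha=0.01)
```

Here the data term is 0.02² + 0.05² = 0.0029 and the penalty is 0.005 × 5.25 = 0.02625; larger α pushes the minimizer towards smaller weights and hence smoother network functions.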

ANNz uses an iterative quasi-Newton method to perform
this minimization. Details of the minimization algorithm and
regularization may be found in Bishop (1995) and Lahav et al.
(1996, appendices).

After each training iteration, the cost function is also eval- 
uated on a separate validation set. After a chosen number of 
training iterations, training terminates and the final weights 
chosen for the ANN are those from the iteration at which the 
cost function is minimal on the validation set. This is useful to 
avoid over-fitting to the training set if the training set is small. 
The trained network may then be presented with previously 
unseen input vectors, and the outputs computed. 
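The validation-based choice of weights described above (often called early stopping) can be sketched generically in Python, with any update rule plugged in as `step`. ANNz itself uses a quasi-Newton minimizer; the usage example swaps in plain gradient descent on a linear model purely to keep the sketch short, and all names and data are hypothetical.

```python
import numpy as np


def train_with_early_stopping(step, val_cost, w0, n_iter):
    """Run n_iter training iterations and keep the weights from the iteration
    at which the validation cost was lowest.

    step     : function w -> w after one training iteration (training set only)
    val_cost : function w -> cost evaluated on the validation set
    """
    w = np.array(w0, dtype=float)
    best_w, best_cost, best_it = w.copy(), val_cost(w), 0
    for it in range(1, n_iter + 1):
        w = step(w)
        c = val_cost(w)
        if c < best_cost:
            best_w, best_cost, best_it = w.copy(), c, it
    return best_w, best_cost, best_it


# Usage: gradient descent on a linear model with a small, noisy training set.
rng = np.random.default_rng(1)
w_true = np.array([0.2, -0.1, 0.05])
X_tr, X_va = rng.normal(size=(20, 3)), rng.normal(size=(200, 3))
z_tr = X_tr @ w_true + 0.05 * rng.normal(size=20)
z_va = X_va @ w_true + 0.05 * rng.normal(size=200)


def gd_step(w, lr=0.01):
    return w - lr * 2.0 * X_tr.T @ (X_tr @ w - z_tr)


def val_sse(w):
    return float(np.sum((X_va @ w - z_va) ** 2))


w_best, c_best, it_best = train_with_early_stopping(gd_step, val_sse, np.zeros(3), 500)
```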

2.2. Photometric noise 

In real situations the inputs to the network (e.g. the
magnitudes in the case of photometric redshift estimation) will
usually have a measurement noise associated with them. We
can assess the variance these errors induce in the output using
the usual chain-rule approach:

$$\sigma_z^2 = \sum_i \left( \frac{\partial z}{\partial m_i} \right)^2 \sigma_{m_i}^2, \qquad (4)$$

where σ_{m_i} is the error on magnitude m_i, the input errors are
assumed to be uncorrelated, and the sum is over the network inputs.

Given a trained network, the output is an analytic function
of the network weights and the input vector: z = z(w, m).
Provided the activation functions, g_i(u_i), are differentiable, the
derivatives ∂z/∂m_i can be obtained through a simple and
efficient algorithm (Bishop 1995, pp. 148-150). This method
is used by ANNz to estimate the variance in its photometric
redshifts due to the photometric noise.
2.3. Network variance

Prior to training, ANNz randomizes the initial values of the 
weights. Depending on the particular initialization state used, 
the training process will usually converge to different local 
minima of the cost function. A simple possibility is to train 
a number of networks and select one based on the best per- 
formance on the validation set. However, this is wasteful of 
training effort and, in fact, the sub-optimal networks can be 
used to improve overall accuracy: the mean of the individual 
outputs of a group of networks (known as a committee) will 
usually be a more accurate estimate for the true redshift than 
the outputs of any one committee member in isolation. 

Using a committee also allows the uncertainty in the output 
due to the variance in the network weights to be estimated. 
For a particular target galaxy the photometric redshift
prediction should ideally be robust to different initializations of the
weight vector. However, it may be the case that the available
photometry or training set does not constrain the redshift very
well (even for high signal-to-noise photometry, in which case the error
estimated by the method of §2.2 could be relatively small).
These cases are more likely to show a large variance in the 
output for different initializations of the weight vector, hence 
using a committee may assist in their identification. ANNz al- 
lows arbitrarily large committees to be used, and estimates the 
contribution of the network variance to the error in the photo- 
metric redshift for each target galaxy. 
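Forming a committee output and its spread is straightforward. In this Python sketch (hypothetical function names and numbers) the committee prediction is the mean of the members' outputs and the network-variance error is taken as their sample standard deviation; combining it with the photometric-noise error in quadrature assumes the two contributions are independent, which is our assumption rather than a documented detail of ANNz.

```python
import numpy as np


def committee_predict(predictions):
    """Combine a committee of networks.

    predictions : array of shape (n_networks, n_galaxies), n_networks >= 2
    Returns the committee mean (the photometric redshift) and the sample
    standard deviation across members (the network-variance error).
    """
    p = np.asarray(predictions, dtype=float)
    return p.mean(axis=0), p.std(axis=0, ddof=1)


def combined_error(sigma_noise, sigma_net):
    """Quadrature sum, assuming the two error sources are independent."""
    return np.sqrt(np.asarray(sigma_noise) ** 2 + np.asarray(sigma_net) ** 2)


# Three hypothetical committee members, two galaxies.
z_mean, z_spread = committee_predict([[0.10, 0.30], [0.20, 0.30], [0.30, 0.30]])
```

The second galaxy, on which all members agree, gets zero network-variance error; the first, on which they disagree, is flagged by a large spread.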

2.4. Using the ANNz package 

We have made ANNz available on the WWW at the following
address: http://www.ast.cam.ac.uk/~aac.
Full instructions are provided with the package, but we pro- 
vide an outline of the procedure here. ANNz comprises two 
main programs, annz_train and annz_test. 

1 — When applying ANNz to any data set for the first time it is 
strongly recommended that a portion of the available training 
data be set aside as an evaluation set. This is used as a mock 
target sample to assess and tune ANNz's performance on the 
data. The evaluation set should therefore be chosen to match 
the real target sample as closely as possible in terms of its 
magnitude and colour distributions. 

2 — The remaining training data should be separated into 
training and validation sets which are supplied to the 
annz_train program along with a description of the
required network architecture. This program performs the
network training as described in §2.1. The trained network
weights are saved to file.

3 — Step 2 may be repeated several times using different net- 
work initialisations to obtain a committee of trained networks. 

4 — The annz_test program can now be used to apply the 
trained networks to the target data. 

Before applying ANNz to the actual photometric target sam- 
ple, the whole procedure should be run several times using the 
evaluation set as the target data, and varying the parameters 
of the training (e.g. weight decay, training and validation set 
sizes, number of networks in the committee) so as to optimize 
the performance. 
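The division of the spectroscopic sample into evaluation, training and validation sets (steps 1 and 2) can be sketched as a random partition in Python/NumPy. This helper is hypothetical and not part of the ANNz package, and a purely random split is only appropriate when the spectroscopic sample already resembles the target sample, as the evaluation set is meant to.

```python
import numpy as np


def split_sample(n_objects, sizes, seed=0):
    """Randomly partition object indices into disjoint named subsets.

    sizes : dict mapping subset name -> number of objects
    """
    if sum(sizes.values()) > n_objects:
        raise ValueError("requested subsets exceed the sample size")
    perm = np.random.default_rng(seed).permutation(n_objects)
    subsets, start = {}, 0
    for name, size in sizes.items():
        subsets[name] = perm[start:start + size]
        start += size
    return subsets


# The EDR test of §3.1 used sets of 15,000, 5,000 and 10,000 galaxies.
sets = split_sample(30000, {"training": 15000, "validation": 5000, "evaluation": 10000})
```

Changing `seed` gives a different random partition, which is one way to check that the tuned training parameters do not depend on a particular split.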

3. APPLICATION TO SDSS DATA 



TABLE 1
Photometric redshift accuracies for the SDSS EDR

Estimation Method      σ_rms
CWW                    0.0666
Bruzual-Charlot        0.0552
Interpolated           0.0451
Polynomial             0.0318
Kd-tree                0.0254
ANNz                   0.0229

NOTE. — The first five entries are the photometric redshift
accuracies obtained by Csabai et al. (2003) for the SDSS Early
Data Release. The result obtained using ANNz is appended for
comparison.



The Sloan Digital Sky Survey² (SDSS; York et al. 2000)
combines a large, five-band (ugriz) imaging survey with a
smaller spectroscopic follow-up survey. This is an ideal situ- 
ation for the application of ANNz since the spectroscopic sur- 
vey represents an excellent training set for the imaging survey. 

The selection algorithm for the SDSS spectroscopic survey
results in two subsets of the data: a main galaxy catalogue
and a luminous red galaxy catalogue (LRG; Eisenstein et al.
2001). The main galaxy catalogue is a flux-limited sample
(r < 17.77) with a median redshift z = 0.104 (Strauss et al.
2002), while the LRG catalogue is flux- and colour-selected to
be a very uniform and approximately volume-limited sample
(it is volume limited to z ≈ 0.4, but probes out to z ≈ 0.6 at
lower completeness).

3.1. Comparison of ANNz with other techniques 

The SDSS consortium have themselves applied a range of
photometric redshift techniques to their commissioning data
(Csabai et al. 2003). Table 1 lists the estimation errors they
obtained. These commissioning data were made public in the
Early Data Release (EDR; Stoughton et al. 2002). In order to
allow a direct comparison of the accuracy of ANNz with the
methods used by Csabai et al. (2003) we selected the main
galaxy and LRG samples from the EDR. From these ~30,000
galaxies we randomly selected training, validation and
evaluation sets with respective sizes 15,000, 5,000 and 10,000.
The network inputs were the dereddened model magnitudes
in each of the five filters, and the overall architecture was
5:10:10:1. A committee of five such networks was trained on
the training and validation sets, then applied to the evaluation
set. Figure 2 shows the ANNz photometric redshift against
the spectroscopic value for each galaxy in the evaluation set.
The rms deviation between these is σ_rms = √⟨(z_phot − z_spec)²⟩ =

2 Funding for the creation and distribution of the SDSS Archive has been 
provided by the Alfred P. Sloan Foundation, the Participating Institutions, the 
National Aeronautics and Space Administration, the National Science Foun- 
dation, the U.S. Department of Energy, the Japanese Monbukagakusho, and 
the Max Planck Society. The SDSS Web site is http://www.sdss.org/
The SDSS is managed by the Astrophysical Research Consortium (ARC) for 
the Participating Institutions. The Participating Institutions are The Univer- 
sity of Chicago, Fermilab, the Institute for Advanced Study, the Japan Par- 
ticipation Group, The Johns Hopkins University, Los Alamos National Lab- 
oratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck- 
Institute for Astrophysics (MPA), New Mexico State University, University 
of Pittsburgh, Princeton University, the United States Naval Observatory, and 
the University of Washington. 
FIG. 2. — Spectroscopic vs. photometric redshifts for ANNz applied to
10,000 galaxies randomly selected from the SDSS EDR. 
FIG. 3. — A subset of 200 galaxies randomly selected from the results
of Fig. 2, with the error bars calculated by ANNz shown. These are
a combination of contributions from photometric noise (§2.2) and network
variance (§2.3).



0.0229, which compares well with the results in Table 1. For
clarity the estimated errors on the photometric redshifts are
not shown in Fig. 2. The results for a randomly selected
subset of 200 galaxies are shown with error bars in Figure 3. Due
to the high quality of the training data in this case, network
variance makes only a small contribution and the errors are
therefore dominated by the photometric noise.
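The rms statistic quoted throughout this section is simple to compute; a minimal Python/NumPy sketch (hypothetical function name) is:

```python
import numpy as np


def sigma_rms(z_phot, z_spec):
    """Root-mean-square deviation sqrt(<(z_phot - z_spec)^2>) over a sample."""
    d = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```

Because the deviations are squared, a few catastrophic outliers can dominate σ_rms, which is worth keeping in mind when comparing methods with different outlier behaviour.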

HYPERZ (Bolzonella, Miralles, & Pelló 2000) is a widely
used template-based photometric redshift package. In order
to compare ANNz more directly with the template-matching
method, HYPERZ was applied to the same evaluation set using
the CWW template SEDs (Coleman et al. 1980). It is clear
from the results in Fig. 4 that not only is the rms dispersion
in the photometric redshift considerably greater than that for
ANNz, but there are also systematic deviations in the HYPERZ
FIG. 4. — Photometric redshift estimation using HYPERZ with the CWW
template SEDs. This uses the same 10,000-galaxy sample as Fig. 2. There
are obvious systematic deviations, with bands apparent above and below the
z_phot = z_spec line.



results. The SDSS consortium obtained similar accuracies to 
HYPERZ in their implementation of the basic template-fitting 
technique (the results labelled CWW and Bruzual-Charlot in
Table 1, for the respective template sets). With more sophisticated
template-based methods they were able to improve
on these errors: the result labelled Interpolated was obtained 
by first tuning the templates using the spectroscopic sample 
as a training set, then producing a continuous range of tem- 
plates by interpolating between the tweaked SEDs. However, 
even "hybrid" methods such as this still do not match the ac- 
curacy achieved by the purely empirical methods (in the table 
these are: Polynomial, which uses a second-order polynomial 
as the fitting function, and Kd-tree, in which the training set is 
partitioned in colour-space and a separate second-order poly- 
nomial is fitted in each cell). 

3.2. Extensions to the basic method 

In this section more advanced use of ANNz is demonstrated. 
These examples use the LRG and main galaxy data from
the SDSS Data Release 1 (DR1; Abazajian et al. 2003), split
into training, validation and evaluation sets of respective sizes
50,000, 10,000 and 64,175. For these data the photometric
redshift accuracy on the evaluation set when using the same
basic method as in §3.1 was σ_rms = 0.0238.

Using additional inputs 

One of the great advantages of empirical photometric red- 
shift methods is the ease with which we can introduce addi- 
tional observables into our parametrization of the photometric 
redshift. This is particularly true for ANNz; we simply add an 
extra input to our network architecture for each new parameter 
we wish to consider. ANNz treats these new inputs in exactly 
the same way as it does the galaxy magnitudes. 

If the additional inputs contain useful information then the 
ANN will use this to improve the accuracy of its predictions. 
However, increasing the number of inputs to the ANN gener- 
ally leads to a reduction in the generalization capabilities of 
the network (that is, its ability to make predictions for data on 
which it has not been trained). Thus, the inputs should be chosen
carefully, as non-informative inputs may actually lead to a
worsened ANN performance: due to the increased dimension- 
ality of the input space, larger training sets may be required 
and there will be an increased likelihood of converging to a 
local rather than the global minimum. 

By way of example, the r-band 50 and 90 per cent Pet- 
rosian flux radii were added as two extra inputs to our ANN. 
These are the angular radii (concentric with the galaxy bright- 
ness distribution) containing the stated fraction of the Pet- 
rosian flux, and therefore contain information on the angular 
size of the galaxy (clearly a strongly distance-dependent prop- 
erty) and the concentration index (essentially the steepness 
of the galaxy brightness profile, which may help break de- 
generacies in the redshift-colour relationship). Running this 
extended data set through ANNz (using a committee of five 
7:11:11:1 networks) produced a redshift estimation accuracy
of σ_rms = 0.0230, an improvement of ~3 per cent compared to
the results based only on the magnitudes. In this example the 
improvement is small (mainly because the training sample al- 
ready provided excellent redshift information), but it demon- 
strates well how straightforwardly the extra information could 
be included for consideration by ANNz. 
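Adding inputs amounts to appending columns to the input array, with one extra input node per column. The Python/NumPy sketch below is illustrative only: `petro_r50` and `petro_r90` are hypothetical placeholders for the r-band 50 and 90 per cent Petrosian radii, the values are random, and the last line simply checks the quoted ~3 per cent figure from the two σ_rms values above.

```python
import numpy as np


def stack_inputs(*columns):
    """Stack per-galaxy inputs into an (n_galaxies, n_inputs) array.

    Each argument may be 1-D (one input) or 2-D (several inputs)."""
    return np.column_stack([np.asarray(c, dtype=float) for c in columns])


def relative_improvement(sigma_old, sigma_new):
    return (sigma_old - sigma_new) / sigma_old


rng = np.random.default_rng(0)
n_gal = 4
magnitudes = rng.uniform(15.0, 20.0, size=(n_gal, 5))    # u, g, r, i, z
petro_r50 = rng.uniform(1.0, 3.0, size=n_gal)            # hypothetical column
petro_r90 = petro_r50 * rng.uniform(2.0, 3.5, size=n_gal)
inputs = stack_inputs(magnitudes, petro_r50, petro_r90)  # 7 inputs -> 7:11:11:1
gain = relative_improvement(0.0238, 0.0230)              # about 0.034, i.e. ~3 per cent
```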

Predicting spectral type 

It is equally straightforward to train ANNz to make pre- 
dictions for properties other than the redshift. Template- 
matching photometric redshift techniques have the useful side 
effect of assigning an estimated spectral type to each galaxy,
in addition to estimating the redshift. Firth et al. (2003) 
demonstrated the use of ANNs to determine spectral types 
from broad-band photometry. 

The spectroscopic catalogue of the SDSS includes a con- 
tinuous parameter (eClass) indicating spectral type which 
ranges from approximately -0.5 (early types) to 1 (late types). 
A 5:10:10:2 network architecture was used to attempt the
simultaneous estimation of redshift and eClass from the
photometry. The accuracy of the redshift estimation was very
slightly poorer, σ_rms = 0.0241. The eClass was determined
with an rms error of σ_rms = 0.0516 (Fig. 5).

3.3. More realistic conditions 

Our example applications to the SDSS above are somewhat 
idealistic, since we are training and testing on samples with 
identical redshift, magnitude and galaxy species distributions. 
Furthermore, our training samples have thus far been very 
large. In this section less optimal training sets are used to 
investigate their impact on the photometric redshift accuracy. 

Smaller training sets 

The size of training sample needed will be strongly depen- 
dent on the range of redshifts and galaxy types in the target 
sample. The same evaluation set of 64,175 galaxies was sub- 
mitted to networks trained on randomly selected samples of 
(i) 2000 galaxies and (ii) 200 galaxies. In both cases these 
samples were split equally into the training and validation 
sets. Committees of five 5:10:10:1 networks were used. 

The photometric redshift accuracies were respectively (i)
σ_rms = 0.0263 and (ii) σ_rms = 0.0343. In the first case the loss
of accuracy is small, while the second case demonstrates well 
the problems associated with small training sets. The rarer 
classes of object in the target sample (e.g. here, those at high 
redshift) feature very sparsely (if at all) in the training set and 
FIG. 5. — Results from using ANNz to predict the spectral type (in the
form of the eClass parameter) simultaneously with the redshift for 64,175 
galaxies from the SDSS Data Release 1. 



so the network is unable to sensibly deal with these objects 
when they appear in the testing data. This leads to an in- 
creased number of outliers and, potentially, the introduction 
of systematic errors. 

Biased training sets 

For increasingly faint targets, acquiring good spectroscopy 
becomes increasingly difficult and eventually prohibitively 
expensive; this problem is the primary motivation for pho- 
tometric redshifts. In practice then, the available spectro- 
scopic training sample is likely to be somewhat brighter on 
average than the photometric target sets. However, the ma- 
jor stumbling block for empirical photometric redshift esti- 
mation techniques is the difficulty in applying them outside 
of the regions of parameter space which are well sampled by 
the training data: while the estimator ought to be able to in- 
terpolate within the training regime, extrapolating beyond is 
much more problematic. Ideally we would like to be able to 
train our estimator on bright galaxies, and then confidently 
apply it to faint galaxies. 

We can improve the ANN's prospects by careful pre- 
selection of the data set. The Luminous Red Galaxies are 
a very uniform sample with respect to spectral types, since 
these early-type galaxies show little spectral evolution with 
redshift; this might be expected to make extrapolation a more 
manageable task. To assess the effectiveness of ANNz in this 
situation the LRG sample was split roughly in half by impos- 
ing a magnitude cut at r = 18.5. The brighter subsample was 
further divided at random into training and validation sets of 
size 5000 and 2000 galaxies respectively. A committee of five 
5:10:10:1 networks was trained on this data and then applied 
to the remaining ~6000 LRGs (for which the limiting
magnitude is r ≈ 19.6).
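The bright/faint split used in this test can be sketched in Python/NumPy, assuming an array of r-band magnitudes for the LRG sample. The function name, seed and random magnitudes are hypothetical; objects lying exactly at the cut are assigned to neither set, mirroring the strict inequalities r < 18.5 and r > 18.5.

```python
import numpy as np


def magnitude_cut_split(r_mag, r_cut, n_train, n_valid, seed=0):
    """Training/validation indices drawn at random from objects with r < r_cut;
    test indices are all objects with r > r_cut."""
    r_mag = np.asarray(r_mag, dtype=float)
    bright = np.flatnonzero(r_mag < r_cut)
    faint = np.flatnonzero(r_mag > r_cut)
    if n_train + n_valid > bright.size:
        raise ValueError("not enough bright objects for the requested sets")
    shuffled = np.random.default_rng(seed).permutation(bright)
    return shuffled[:n_train], shuffled[n_train:n_train + n_valid], faint


# Usage with random magnitudes standing in for an LRG catalogue.
r = np.random.default_rng(2).uniform(16.0, 19.6, size=1000)
train_idx, valid_idx, test_idx = magnitude_cut_split(r, 18.5, 400, 150)
```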

The results are shown in Fig. 6. The overall dispersion is
σ_rms = 0.0327, which represents only a slight loss of accuracy
when compared with results using an LRG training set selected
over all magnitudes (σ_rms = 0.0294). Thus, in this particular
case, ANNz is able to extrapolate with some success to around 
a magnitude fainter than is sampled by the training data. 
FIG. 6. — Results from training networks on LRGs with r < 18.5, but
applied to LRGs with strictly r > 18.5 (note the change of intercept of the 
axes). The limiting magnitude for the LRGs is r ~ 19.6. 



4. CONCLUSIONS 

In appropriate circumstances, ANNz is a highly competi- 
tive tool for photometric redshift estimation. However, it 
does rely on the existence of a sufficiently large training 
set which is representative of the particular populations be- 
ing studied. The package's utility therefore lies particularly 
with large photometric surveys such as the SDSS, GOODS
(Dickinson et al. 2001) or the VIRMOS-VLT Deep Survey
(Le Fèvre et al. 2003), some of which include spectroscopic
surveys for subsets of the photometric catalogues (for exam- 
ple, of the eventual 100 million photometric objects which 



the SDSS expect to catalogue, 1 million will also have spec- 
troscopy, and hence accurate redshifts). 

A major problem for empirical photometric redshift esti- 
mators is the difficulty in extrapolating to regions of the input 
parameter space which are not well sampled by the training 
data. Care should be taken to match the training data to the 
target sample as closely as possible in terms of the magnitude 
and colour distributions of each. Use of an evaluation set is 
essential when applying ANNz to a new data set: the good 
performance demonstrated here on the SDSS data cannot be 
guaranteed on different data sets. 

A potential solution to the problem of obtaining training 
sets when spectroscopy is difficult to obtain is to use simulated
catalogues as training data (e.g. Vanzella et al. 2003).
Since this requires the use of theoretical SEDs it introduces 
the disadvantages of the template-based methods, such as the 
need for precise calibration. However, the ANN approach 
has advantages over standard template-matching: simulated 
catalogues can contain galaxies representing a large range of 
complex star formation histories, dust extinction models and 
metallicities etc., giving fully Bayesian statistics, and ANNs 
allow much more flexible weighting to be applied to the
filters than is possible with the simple χ² weighting of standard
template-matching. 